Mitochondrial (mt) genome sequences, which often bear introns, have been sampled from phylogenetically diverse
eukaryotes. Thus, we can anticipate novel insights into intron evolution from previously unstudied mt genomes. Here we
investigated the origins and evolution of three introns in the mt genome of the haptophyte Chrysochromulina sp.
NIES-1333, which was sequenced completely in this study. All three introns were characterized as group II on the basis
of their predicted secondary structures and the conserved sequence motifs at the 5' and 3' termini. Our comparative
studies on diverse mt genomes prompt us to propose that the Chrysochromulina mt genome laterally acquired the introns
from mt genomes in distantly related eukaryotes. Many group II introns harbor intronic open reading frames for proteins
(intron-encoded proteins or IEPs), which likely facilitate the splicing of their host introns. However, we propose that a
"free-standing," IEP-like protein, which is not encoded within any intron in the Chrysochromulina mt genome, is involved
in the splicing of the first cox1 intron, which lacks any open reading frame.



Introduction 

Mitochondrial (mt) genomes can be regarded as a model system
for studying intron evolution, as a massive amount of mt genome
data including group I and/or group II (gII) introns has been
accumulated (for instance, see NCBI Organelle Genome Resource:
http://www.ncbi.nlm.nih.gov/genomes/OrganelleResource.cgi?opt=organelle&taxid=2759).
Group II introns are found in the genomes of prokaryotes
(bacteria and archaebacteria), as well as mitochondria and
plastids, 1,2 which are derived from an α-proteobacterium 3 and
a cyanobacterium, 4 respectively. So far, gII introns have been
identified in mt genomes from members of phylogenetically
diverse eukaryotic assemblages such as Metazoa, 5,6 Jakobida, 7,8
Archaeplastida, 9-11 Fungi, 12,13 Cryptophyta, 14 Haptophyta, 15,16
and Stramenopiles. 17-19 These gII introns possess characteristic
features at both the primary and secondary structure levels. At
the primary structure level, gII introns possess highly conserved
sequence motifs at the 5' and 3' ends (i.e., 5'-GTGYG...AY-3';
Y for T or C). 20 At the secondary structure level, the
transcripts of typical gII introns (intron RNAs) are expected to
form a characteristic bulge structure with six stems, the
so-called domains I to VI. 21 Both the primary and secondary
structures of intron RNAs are most likely critical for the
splicing reaction. 22

Group II introns can be regarded as mobile genetic elements,
which are transmittable from an intron-containing locus to an
intron-lacking locus (intron homing), regardless of their
evolutionary distance. The mobility of gII introns is most likely
conferred by the proteins encoded within gII introns
(intron-encoded proteins or IEPs). Typical IEPs comprise three
functionally distinct domains, namely i) a reverse transcriptase
(RT) domain, ii) domain X, which is also referred to as the
maturase domain, and iii) an endonuclease (En) domain, 23
although some IEPs were reported to lack the En domain. 20,22
Among the three domains, the RT and En domains are predicted to
catalyze reverse transcription of the intron RNA and to digest
the target (intron-lacking) locus, respectively. 20,23 Domain X
may not be responsible for intron mobility, but assists splicing
by stabilizing the conformation of the intron RNA. Nevertheless,
many "IEP-free" gII introns are known, and it is difficult to
predict the protein factors that cooperate with a particular
IEP-free intron in trans. To our knowledge, there is only a
single report that successfully identified the organellar



Correspondence to: Yuji Inagaki; Email: yuji@ccs.tsukuba.ac.jp 

Submitted: 04/21/2014; Revised: 05/27/2014; Accepted: 05/27/2014; Published Online: 05/27/2014 

Citation: Nishimura Y, Kamikawa R, Hashimoto T, Inagaki Y. An intronic open reading frame was released from one of group II introns in the mitochondrial 
genome of a haptophyte Chrysochromulina sp. NIES-1333. Mobile Genetic Elements 2014; 4:e29384; http://dx.doi.org/10.4161/mge.29384 






Figure 1. Mitochondrial genome of Chrysochromulina sp. NIES-1333. Protein-coding genes and rRNA genes are represented by boxes. Gray boxes
represent two open reading frames whose amino acid sequences showed significant sequence similarity to intron-encoded proteins. Transfer RNA genes
are represented by lines. Introns are shown by dotted lines. Arrows represent duplicated regions.



genome-encoded trans factor involved in the splicing of an
IEP-free intron. 24

In this study, we completely sequenced the mt genome of the
haptophyte Chrysochromulina sp. NIES-1333 and identified three
introns in total: two are found in the cox1 gene encoding
cytochrome c oxidase subunit 1, and the third is found in the
rnl gene encoding large subunit rRNA. Analyses of the intron
sequences suggest that the three introns in the
Chrysochromulina mt genome belong to group II. We identified
two open reading frames encoding putative IEPs. Both showed
significant sequence similarity to gII intron-hosted IEPs in mt
genomes; one is orf627, encoded in the second cox1 intron, and
the other is orf584, which is free-standing. Phylogenetic
analyses of IEPs and comparisons of intron positions across
phylogenetically diverse mt genomes revealed that the
Chrysochromulina mt genome shares homologous introns with
distantly related mt genomes.



Results and Discussion 

Overview of the Chrysochromulina mt genome 
The mt genome of Chrysochromulina sp. NIES-1333 was assembled
into a circular molecule of 34,291 bp in length with an A + T
content of 70.0% (Fig. 1). We identified 16 functionally
assignable open reading frames (including those for two IEPs;
see below). UGA codons are most likely assigned to tryptophan
instead of serving as a termination signal (Table S1), as
reported previously. 25 This type of mt genetic code was
reported in other members of Prymnesiophyceae, one of the two
classes of Haptophyta. 26-28 We detected 26 tRNA genes and a
set of small and large subunit rRNA genes; no 5S rRNA gene was
identified. The set of tRNA genes identified in the mt genome
is sufficient to translate all amino acid codons except for the
GGN (N = A, C, G or U) codons for glycine (Table S1). All genes
mentioned above were encoded on a single strand. A region with
an approximate length of 1.6 Kbp, which contained a single tRNA
gene for isoleucine, was found to be duplicated (shown by open
arrows in Fig. 1).
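
The codon reassignment described above can be illustrated with a minimal sketch. It assumes that the only departure from the standard genetic code is UGA (TGA in DNA) read as tryptophan, as stated in the text; the function names and the test sequence are hypothetical and are not taken from the annotation pipeline used in this study.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(reassign_tga_to_trp):
    """Return a codon -> amino acid table (assumption: only TGA differs)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AA))
    if reassign_tga_to_trp:
        table["TGA"] = "W"  # UGA read as tryptophan instead of stop
    return table


def translate(dna, reassign_tga_to_trp=True):
    """Translate a coding sequence in frame 0; unknown codons become 'X'."""
    table = build_code(reassign_tga_to_trp)
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    toy = "ATGTGATAA"  # hypothetical sequence: Met, UGA, stop
    print(translate(toy))         # UGA decoded as Trp
    print(translate(toy, False))  # UGA decoded as stop
```

Translating a candidate open reading frame under both settings is one quick way to see whether in-frame UGA codons would truncate the predicted protein under the standard code.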

In terms of gene repertoire, the Chrysochromulina mt genome
is fundamentally similar to those of other haptophytes, namely
Emiliania huxleyi, 1 Diacronema lutheri
(www.bch.umontreal.ca/ogmp/projects/pluth/gen.html), and
Phaeocystis spp., 12 as shown in Table 1.

General features of the intron in the Chrysochromulina rnl 
gene 

The rnl gene hosts a single intron encoding no apparent open
reading frame (designated as Ch_rnli; Fig. 1). The intron was
found to be inserted at the position between the 837th and
838th bases in the Homo sapiens homolog (GeneID: 4550 in
NC_012920). Ch_rnli starts with 5'-GTGCG... and ends with
...CT-3', which is similar to the consensus motifs shared among
typical gII introns (5'-GTGYG...AY-3'; Y for T or C). Although
the intron sequence was too divergent to predict the entire
secondary structure, we successfully identified the typical
domains V and VI, which are characteristic secondary structures
of gII introns (Fig. S1A), with the aid of MFannot
(http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl).
Thus, we characterized Ch_rnli as group II.
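
As an illustration of the terminal-motif comparison, the following sketch tests a candidate intron sequence against the consensus 5'-GTGYG...AY-3' (Y for T or C). It is only a string check under that stated consensus; the function name and the placeholder sequences are hypothetical, and the real assignment in this study also relied on secondary structure prediction.

```python
import re

# Consensus termini of typical group II introns: 5'-GTGYG ... AY-3' (Y = C or T).
FIVE_PRIME = re.compile(r"^GTG[CT]G")
THREE_PRIME = re.compile(r"A[CT]$")


def check_gii_termini(intron):
    """Report whether each terminus of the intron matches the consensus."""
    seq = intron.upper().replace("U", "T")
    return {
        "five_prime": bool(FIVE_PRIME.search(seq)),
        "three_prime": bool(THREE_PRIME.search(seq)),
    }


if __name__ == "__main__":
    # Placeholder introns: only the termini are meaningful here.
    ends_with_ac = "GTGCG" + "N" * 20 + "AC"
    ends_with_ct = "GTGCG" + "N" * 20 + "CT"
    print(check_gii_termini(ends_with_ac))
    print(check_gii_termini(ends_with_ct))
```

An intron ending in ...CT-3' passes the 5' check but not the strict 3' check, which is why such a terminus is better described as similar to the consensus than as identical to it.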

The homing position of Ch_rnli was found to be identical to
those of rnl introns found in a member of Archaeplastida (the
red alga Pyropia haitanensis; NC_017751), two members of
Stramenopiles (the brown alga Pylaiella littoralis and the
diatom Phaeodactylum tricornutum; NC_003055 and HQ840789,
respectively), and a member of Opisthokonta (the fungus
Gigaspora rosea; NC_016985) (Fig. S1A; note that none of the
rnl introns found in the mt genomes of other haptophytes, D.
lutheri and Phaeocystis globosa, shares the homing position
with Ch_rnli). We predicted the secondary structures of domains
V and VI in Ch_rnli and the four introns described above, but
detected no apparent homology at the nucleotide sequence level
among them (Fig. S1B). Furthermore, Ch_rnli hosts no IEP, which
is considered a key aspect for inferring intron evolution. 22
Thus, we hesitate to discuss the evolutionary relationship
among Ch_rnli and the introns listed above solely on the basis
of their homing positions.

General features of two introns in the Chrysochromulina
cox1 gene

Two introns were found in the Chrysochromulina cox1 gene
(Fig. 1). We designate the first and second introns in the cox1
gene as Ch_cox1i1 and Ch_cox1i2, respectively. Both introns
start with 5'-GTGCG... and end with ...AC-3', consistent with
the consensus motifs shared among typical gII introns
(5'-GTGYG...AY-3'). Ch_cox1i1 was inserted between the second
and third letters (phase-2) of the codon corresponding to Phe 68
in the H. sapiens cox1 gene, sharing the homing position with
the gII introns in cox1 genes of the cryptophyte Rhodomonas
salina and the diatom Phaeodactylum tricornutum (Fig. S2C).
Ch_cox1i2 was found at phase-2 of the codon corresponding to
Phe 237 in the H. sapiens cox1 gene, the same position as those
of the gII introns in cox1 genes of the haptophyte D. lutheri
and the diatom Ulnaria acus (Fig. S2C). Ch_cox1i2 hosts an
intronic open reading frame for an IEP, while Ch_cox1i1 encodes
no apparent open reading frame. Both Ch_cox1i1 and Ch_cox1i2
can be folded into the characteristic secondary structures
shared among gII introns, albeit with some ambiguity remaining
in domain I (indicated as "DI" in Fig. S1B and S1C). Altogether,
we concluded that the two introns belong to group II.
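
The homing positions above are expressed as a codon number plus a phase. A minimal sketch of that bookkeeping follows, assuming that the phase is the number of nucleotides of the interrupted codon lying upstream of the intron (so phase-2 means an insertion between the second and third letters, as in the text). The function names are hypothetical, and the treatment of phase-0 (intron immediately before the named codon) is our assumption rather than a convention stated in this paper.

```python
def intron_site(n_upstream):
    """Convert the number of coding nucleotides upstream of an intron
    into a (codon number, phase) pair, with codons numbered from 1."""
    if n_upstream < 0:
        raise ValueError("n_upstream must be non-negative")
    codon_index, phase = divmod(n_upstream, 3)
    return codon_index + 1, phase


def upstream_length(codon_number, phase):
    """Inverse of intron_site: coding nucleotides preceding the intron."""
    if codon_number < 1 or phase not in (0, 1, 2):
        raise ValueError("codon_number >= 1 and phase in {0, 1, 2} required")
    return (codon_number - 1) * 3 + phase


if __name__ == "__main__":
    # Phase-2 insertions at the codons corresponding to Phe 68 and Phe 237.
    for codon in (68, 237):
        n = upstream_length(codon, 2)
        print(codon, n, intron_site(n))
```

Two introns share a homing position in this sense only when both the reference codon number and the phase agree, which requires mapping each host gene onto the same reference sequence (here, the H. sapiens cox1 gene).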

Evolution of Ch_cox1i2 and its IEP

The IEP encoded in the intronic open reading frame of
Ch_cox1i2, ORF627, most likely facilitates splicing of the host
intron. The ORF627 amino acid sequence showed similarity to
other gII intron-hosted IEP sequences deposited in the GenBank
database; the top blastp hit was an IEP encoded in the first gII
intron of the cox1 gene in the haptophyte D. lutheri (Dl_cox1i),
with a 49% sequence similarity and an E-value of 0.0. In both
maximum-likelihood (ML) and Bayesian analyses of an IEP
alignment (Fig. 2), Chrysochromulina ORF627 formed a clade with
two IEPs encoded in cox1 gII introns, namely Dl_cox1i and that
of the diatom U. acus (Ua_cox1i), with an ML bootstrap support
value (MLBP) of 96% and a Bayesian posterior probability (BPP)
of 1.00 (Fig. 2). As gII introns and their IEPs are generally
believed to have coevolved, 22 the intimate relationship among
the IEPs encoded in Ch_cox1i2, Dl_cox1i, and Ua_cox1i suggests
that their host introns are derived from a single ancestral
intron bearing an IEP. The single origin of Ch_cox1i2,
Dl_cox1i, and Ua_cox1i discussed above is consistent with the
fact that the three introns share a homing position (Fig. S2C).

The ancestral haptophyte species likely possessed a cox1 gene
with a particular gII intron, as Chrysochromulina sp. and D.
lutheri are representatives of the two major classes in
Haptophyta, Prymnesiophyceae and Pavlovaphyceae, respectively.
This scenario suggests that multiple intron losses occurred in
the cox1 genes of
Emiliania huxleyi, 2 *' 29 members of the genus Phaeocystis, 12,26 and 
Isochrysis galbana. ia 

The IEP phylogeny and comparison of homing positions imply
that homologous introns are present in two distantly related
branches (haptophytes and diatoms) in the tree of eukaryotes.
This sporadic intron distribution can be explained by a
scenario incorporating lateral intron transfer. There is an
alternative but less plausible scenario, which assumes that the
cox1 gene in an ancestral organism that existed prior to the
divergence of major eukaryotic assemblages including diatoms
and haptophytes already possessed a gII intron at phase-2 of
the codon corresponding to Phe 237 in the H. sapiens cox1 gene,
and that this intron was (secondarily) lost in multiple
descendants (i.e., ancestral co-occurrence followed by multiple
secondary losses). We prefer the scenario incorporating lateral
intron transfer to the alternative one, but these scenarios
should be reexamined by future studies based on a broader
diversity of gII introns (and their IEPs) than that considered
in the current study.

Link between a free-standing orf584 and an IEP-free
Ch_cox1i1

Most IEPs are encoded in intronic open reading frames (as
observed in Ch_cox1i2; see above), but a few of them are
free-standing in genomes (e.g., orf732 in the Marchantia
polymorpha mt genome; highlighted by a star in Fig. 2). Our
blast search showed that Chrysochromulina ORF584, which is
encoded in a

Table 1. Gene repertoires in haptophyte mitochondrial genomes

Gene     Chrysochromulina sp.   Diacronema lutheri   Emiliania huxleyi   Phaeocystis globosa   Phaeocystis antarctica
rnl      Y[1]                   Y[1]                 Y                   Y[1]                  Y
rns      Y                      Y[1]                 Y                   Y                     Y
rrn5     N                      Y                    Y                   N                     N
trn      23 species             22 species           23 species          23 species            23 species
nad1     Y                      Y                    Y                   Y                     Y
nad2     N                      N                    N                   Y                     Y
nad3     N                      N                    N                   Y                     Y
nad4     Y                      Y                    Y                   Y                     Y
nad4L    Y                      Y                    Y                   Y                     Y
nad5     Y                      Y                    Y                   Y                     Y
nad6     Y                      Y                    Y                   Y                     Y
cob      Y                      Y                    Y                   Y                     Y
cox1     Y[2]                   Y[1]                 Y                   Y                     Y
cox2     Y                      Y                    Y                   Y                     Y
cox3     Y                      Y[1]                 Y                   Y                     Y
atp4     N                      Y                    Y                   Y                     Y
atp6     Y                      Y[1]                 Y                   Y                     Y
atp8     N                      Y                    N                   Y                     Y
atp9     Y                      Y[1]                 Y                   Y                     Y
rps3     N                      N                    Y                   Y                     Y
rps8     N                      N                    Y                   N                     N
rps12    Y                      Y                    Y                   Y                     Y
rps14    N                      Y                    Y                   N                     Y
rps19    N                      Y                    N                   N                     N
rpl14    N                      Y                    N                   N                     N
rpl16    Y                      Y                    Y                   Y                     Y
dam      N                      N                    Y                   N                     N
Others   orf627(a)              orf636(c)            orf104(d)           N                     N
         orf538(b)              orf105(d)

Y, yes; N, no. Numbers of introns are shown in brackets. (a) Encoded in the second cox1 intron. (b) Free-standing open reading frame
encoding a protein with amino acid sequence similarity to group II intron-encoded proteins. (c) Encoded in the cox1 intron. (d) Encodes
an uncharacterized protein.



free-standing open reading frame, bore a significant sequence
similarity to gII intron-hosted IEPs; the top blastp hit of the
ORF584 amino acid sequence was an IEP (ORF724) encoded in the
first intron of the cox1 gene in the diatom P. tricornutum
(Pt_cox1i1), with a 53% sequence similarity and an E-value of
0.0. ORF584 possesses the En, RT, and X domains, implying that
this protein assists intron splicing. The phylogenetic analyses
of the IEP alignment (Fig. 2) recovered a robust affinity
between Chrysochromulina ORF584 and ORF724 encoded in Pt_cox1i1,
with an MLBP of 100% and a BPP of 1.00 (Fig. 2). This indicates
that the two proteins were derived from a single ancestral IEP
encoded in a gII intron that is homologous to Pt_cox1i1, the
host intron of ORF724. Curiously, Pt_cox1i1 and Ch_cox1i1
appeared to share a homing position (see Fig. S2C). We also
noticed that the nucleotide sequences of domain VI in Ch_cox1i1
and Pt_cox1i1 are similar to one another (Fig. S1D), although
the sequence of this domain is generally variable among gII
introns. 21 The shared homing position and the sequence
similarity in domain VI between Ch_cox1i1 and Pt_cox1i1 suggest
that the two introns are homologous to each other. Altogether,
we here propose that ORF584 used to be encoded in Ch_cox1i1 and
still assists the splicing of its former host intron, even after
secondarily becoming free-standing in the current
Chrysochromulina mt genome. To the best of our knowledge, a
link between a particular pair of a free-standing IEP and an
IEP-free intron has been reported only once prior to this
work. 24

The first intron in the R. salina cox1 gene (Rs_cox1i1) is
unlikely to be homologous to Pt_cox1i1 or Ch_cox1i1, although

Figure 2. Phylogeny inferred from 52 intron-encoded protein (IEP) amino acid sequences. The IEP alignment was subjected to both
maximum-likelihood (ML) and Bayesian methods. As the two methods reconstructed very similar trees, only the ML tree is shown here. The tree
is rooted by the bacterial sequences. Values at nodes represent ML bootstrap support values greater than 50%. The nodes supported by Bayesian
posterior probabilities equal to or greater than 0.95 are highlighted by thick lines. The IEPs encoded in cox1 introns are shaded in orange.
The detailed homing positions of cox1 introns are given on the right side of the tree. Codon numbers are based on the Homo sapiens cox1 gene
(GenBank accession number, YP_003024028). Free-standing IEPs are highlighted with stars.

the three introns share the homing position (Fig. S1C). The IEP
phylogeny placed the IEP encoded in Rs_cox1i1 in a position
remote from the clade of ORF584 and ORF724 (Fig. 2), strongly
arguing against homology between Rs_cox1i1 and
Pt_cox1i1/Ch_cox1i1. The homology between Pt_cox1i1 and
Ch_cox1i1, which were found in two phylogenetically distantly
related species (i.e., a haptophyte and a diatom), can be
explained by lateral intron transfer. Nonetheless, we cannot
exclude the alternative possibility of ancestral co-occurrence
followed by multiple secondary losses. We prefer the simplicity
of the first scenario incorporating lateral intron transfer, but
the alternative scenario should not be ignored until mt genome
diversity is sufficiently covered.

Materials and Methods 

Cell culture and mt genome sequencing 

The haptophyte Chrysochromulina sp. NIES-1333 was purchased
from the National Institute for Environmental Studies (NIES;
16-2 Onogawa, Tsukuba, Ibaraki 305-8506, Japan). Haptophyte
cells were grown in f/2 medium
(http://mcc.nies.go.jp/02medium.html#f2) at 20 °C under 14 h
light/10 h dark cycles. The cultured cells were harvested by
centrifugation. Total DNA and total RNA were extracted from the
harvested cells with CTAB buffer 31 and TRIzol (Invitrogen),
respectively. Total RNA was used to synthesize cDNA with random
hexamers and SuperScript II reverse transcriptase (Invitrogen).
RNA extraction and cDNA synthesis were conducted following the
manufacturers' protocols.

We amplified the entire mt genome by a combination of LA PCR
with TaKaRa LA Taq DNA polymerase (TaKaRa), genome walking with
the GenomeWalker Universal kit (Clontech), and rolling circle
amplification (RCA) with the illustra TempliPhi 100
Amplification kit (GE Healthcare Life Sciences). Amplified DNA
fragments of < 3 Kbp-long and those of < 10 Kbp-long were cloned
into the pGEM T-easy vector (Promega) and pCR-TOPO-XL
(Invitrogen), respectively. The short amplicons (< 3 Kbp-long)
were sequenced by the Sanger method. 454 pyro-sequencing with
the GS-Jr system (454 Sequencing, Roche) was performed on the
long amplicons (> 10 Kbp). Newbler (454 Sequencing, Roche) was
used for de novo assembly of the pyro-sequencing reads. The DNA
amplifications and sequencing described above were conducted
following the manufacturers' instructions, except that the RCA
was performed with custom primers instead of the random hexamers
supplied in the kit, as described in Kamikawa et al. (2014). 32
The custom primers used for the RCA were designed based on the
cob and cox3 sequences determined previously. 25 All the DNA
sequences obtained were finally assembled into a circular
molecule with an approximate length of 34 Kbp. The complete mt
genome sequence was deposited in DDBJ/EMBL/GenBank under
accession number AB930144.

Genome analyses 

Genes encoding proteins and rRNAs were identified by blastx
and blastn searches, 33 respectively, against the non-redundant
database at the National Center for Biotechnology Information
(http://blast.ncbi.nlm.nih.gov/Blast.cgi). Transfer
RNA-encoding genes were found by using tRNAscan-SE. 34
Independently of the analyses described above, we re-annotated
the genome with MFannot
(http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl).

During the annotation of the Chrysochromulina mt genome, we
noticed that both the cox1 and rnl genes are interrupted by
introns. The precise boundaries of the introns in the mt genome
were confirmed by sequencing the corresponding transcripts
(cDNAs). The secondary structures of the introns identified in
the mt genome were predicted by MFOLD, 35 followed by manual
refinement with reference to Toor et al. (2001) 21 and the
GOBASE database. 36

Phylogenetic analyses 

We found that orf627 and orf584 in the Chrysochromulina mt
genome encode IEPs (see above). The conceptual amino acid
sequences of the two IEPs were aligned with 46 IEPs encoded in
other mt genomes and four bacterial homologs by using MUSCLE. 37
The IEP sequences were retrieved from the GenBank database by
referring to pioneering phylogenetic studies (e.g., refs. 17
and 18). After manual refinement and exclusion of ambiguously
aligned positions, the final alignment, including 52 IEP
sequences with 453 amino acid positions, was used for
phylogenetic analyses.
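
The exclusion of ambiguously aligned positions was done manually in this study. Purely as an illustration of what column exclusion does to an alignment, the sketch below drops columns whose gap fraction exceeds a threshold; this automated criterion, the threshold value, and the function name are our assumptions and do not reproduce the manual refinement described above.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns in which the fraction of gap characters
    ('-' or '?') exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(rows)
    keep = [
        j for j in range(len(rows[0]))
        if sum(row[j] in "-?" for row in rows) / n_seqs <= max_gap_fraction
    ]
    return {name: "".join(row[j] for j in keep) for name, row in zip(names, rows)}


if __name__ == "__main__":
    toy = {"seqA": "MK-L", "seqB": "M--L", "seqC": "MR-L"}  # hypothetical data
    print(drop_gappy_columns(toy))
```

A gap-fraction filter is only one of several possible criteria; manual inspection, as used here, can also remove columns that are gap-free but of doubtful homology.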

The alignment was subjected to both ML and Bayesian methods
using RAxML 7.2.6 38 and PhyloBayes 3.3, 39 respectively. We
applied the LG amino acid substitution model 40 with among-site
rate variation approximated by a discrete gamma distribution
with four rate categories (LG + Γ + F) to the ML analyses of
the original alignment and 100 bootstrap replicates. The ML
tree was selected by heuristic searches from 10 randomized
maximum-parsimony (MP) starting trees. In the ML bootstrap
analysis, a heuristic tree search was performed from a single
MP starting tree per replicate. Bayesian analysis was conducted
with the LG + Γ + F model. Two independent Monte Carlo chains
were run for 5,800-5,850 cycles, reaching a maxdiff value of
0.08353. The first 100 cycles were discarded as "burn-in"; the
consensus tree, branch lengths, and BPPs were calculated from
the remaining trees.
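
The burn-in and convergence bookkeeping can be sketched as follows. The sketch assumes that a posterior probability for a clade is its frequency among post-burn-in samples and that maxdiff is the largest difference in clade frequency between the two chains; this is our reading of the statistic, not a reimplementation of PhyloBayes, and the tree representation (a set of clades per sampled tree) and all names are hypothetical.

```python
from collections import Counter


def clade_frequencies(samples, burn_in):
    """Frequency of each clade among the samples kept after burn-in.

    samples: list of sampled trees, each given as a set of clades,
             where a clade is a frozenset of taxon names.
    """
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("no samples left after discarding burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


def maxdiff(freq_a, freq_b):
    """Largest absolute difference in clade frequency between two chains."""
    clades = set(freq_a) | set(freq_b)
    if not clades:
        return 0.0
    return max(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in clades)


if __name__ == "__main__":
    ab, ac = frozenset({"A", "B"}), frozenset({"A", "C"})
    chain_1 = [{ab}, {ab}, {ac}, {ab}]  # toy samples; the first is burn-in
    chain_2 = [{ac}, {ab}, {ab}, {ab}]
    f1 = clade_frequencies(chain_1, burn_in=1)
    f2 = clade_frequencies(chain_2, burn_in=1)
    print(f1[ab], f2[ab], maxdiff(f1, f2))
```

In this toy case the two chains disagree on one third of their retained samples; a small value such as the one reported above indicates that the two chains sampled nearly the same clade frequencies.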